Spectral Thompson Sampling
Authors
Abstract
Thompson Sampling (TS) has attracted a lot of interest due to its good empirical performance, in particular in computational advertising. Though TS has been successful, the tools for analyzing its performance appeared only recently. In this paper, we describe and analyze the SpectralTS algorithm for a bandit problem in which the payoffs of the choices are smooth with respect to an underlying graph. In this setting, each choice is a node of a graph and the expected payoffs of neighboring nodes are assumed to be similar. Although the setting has applications in both recommender systems and advertising, traditional algorithms would scale poorly with the number of choices. For that purpose we consider an effective dimension d, which is small in real-world graphs. We provide an analysis showing that the regret of SpectralTS scales as $d\sqrt{T \ln N}$ with high probability, where T is the time horizon and N is the number of choices. Since a $d\sqrt{T \ln N}$ regret is comparable to the known results, SpectralTS offers a computationally more efficient alternative. We also show that our algorithm is competitive on both synthetic and real-world data.
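The setting described in the abstract can be illustrated with a minimal sketch, which should not be read as the authors' implementation. It assumes Gaussian Thompson sampling over the eigenbasis of the graph Laplacian, with a prior precision built from the Laplacian eigenvalues. The function names, the path-graph example, and the parameters `reg`, `scale`, and `noise` are all hypothetical choices made for illustration; none of them comes from the abstract.

```python
import numpy as np


def path_graph(n):
    """Adjacency matrix of an unweighted path graph on n nodes (toy example)."""
    adjacency = np.zeros((n, n))
    idx = np.arange(n - 1)
    adjacency[idx, idx + 1] = 1.0
    adjacency[idx + 1, idx] = 1.0
    return adjacency


def smoothness(adjacency, payoffs):
    """Quadratic form f^T L f; small values mean neighbors have similar payoffs."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return float(payoffs @ laplacian @ payoffs)


def laplacian_eigenbasis(adjacency):
    """Eigenvalues (ascending) and eigenvectors of the combinatorial Laplacian."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return np.linalg.eigh(laplacian)


def spectral_ts_sketch(adjacency, true_payoffs, horizon,
                       reg=0.1, scale=0.1, noise=0.05, seed=0):
    """Gaussian Thompson sampling in the Laplacian eigenbasis (assumed form).

    Row i of `features` represents node i in the eigenbasis. The prior
    precision penalizes high-frequency eigenvectors, which encodes the
    assumption that payoffs are smooth on the graph.
    """
    rng = np.random.default_rng(seed)
    eigenvalues, features = laplacian_eigenbasis(adjacency)
    n = adjacency.shape[0]
    precision = np.diag(eigenvalues + reg)  # assumed regularized prior
    weighted_rewards = np.zeros(n)
    pulls = np.zeros(n, dtype=int)
    for _ in range(horizon):
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2.0  # guard against numerical asymmetry
        mean = cov @ weighted_rewards
        sample = rng.multivariate_normal(mean, scale ** 2 * cov)
        arm = int(np.argmax(features @ sample))  # best node under the sample
        reward = true_payoffs[arm] + noise * rng.standard_normal()
        precision += np.outer(features[arm], features[arm])
        weighted_rewards += reward * features[arm]
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    adjacency = path_graph(10)
    payoffs = np.linspace(0.0, 1.0, 10)  # smooth along the path
    print("smoothness:", smoothness(adjacency, payoffs))
    print("pull counts:", spectral_ts_sketch(adjacency, payoffs, horizon=200))
```

This sketch inverts an N-by-N matrix in every round, so it is meant only to make the smoothness assumption concrete; it does not reflect any efficiency claim made in the abstract.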
Similar resources
Horvitz-Thompson estimator of population mean under inverse sampling designs
Inverse sampling design is generally considered to be an appropriate technique when the population is divided into two subpopulations, one of which contains only a few units. In this paper, we derive the Horvitz-Thompson estimator for the population mean under inverse sampling designs, where subpopulation sizes are known. We then introduce an alternative unbiased estimator, corresponding to post-st...
Compressive and Noncompressive Power Spectral Density Estimation from Periodic Nonuniform Samples
This paper presents a novel power spectral density estimation technique for band-limited, wide-sense stationary signals from sub-Nyquist sampled data. The technique employs multicoset sampling and incorporates the advantages of compressed sensing (CS) when the power spectrum is sparse, but applies to sparse and nonsparse power spectra alike. The estimates are consistent piecewise constant appro...
Thompson sampling with the online bootstrap
Thompson sampling provides a solution to bandit problems in which new observations are allocated to arms with the posterior probability that an arm is optimal. While sometimes easy to implement and asymptotically optimal, Thompson sampling can be computationally demanding in large scale bandit problems, and its performance is dependent on the model fit to the observed data. We introduce bootstr...
A Note on Information-Directed Sampling and Thompson Sampling
This note introduces three Bayesian-style multi-armed bandit algorithms: Information-Directed Sampling, Thompson Sampling, and Generalized Thompson Sampling. The goal is to give an intuitive explanation of these three algorithms and their regret bounds, and to provide some derivations that are omitted in the original papers.
Thompson Sampling for Multi-Objective Multi-Armed Bandits Problem
The multi-objective multi-armed bandit (MOMAB) problem is a sequential decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single scalar reward. Moreover, these multiple rewards might be conflicting. The MOMAB-problem has a set of Pareto optimal arms and an agent’s goal is not only to find that set but also to play evenly or fairly the arms in that set....